Abstract

The artificial neural network approach is used to separate signals from a single photon γ from those of the products of the π⁰, η, K⁰_S meson neutral decay channels on the basis of data from the CMS electromagnetic calorimeter alone. Rejection values for the three types of mesons as a function of the single photon selection efficiency are obtained for two Barrel and one Endcap pseudorapidity regions and initial E_T of 20, 40, 60 and 100 GeV.



1. Introduction. 

In our previous papers it was proposed to use the direct photon production process, based at the partonic level on the Compton-like QCD subprocess qg → q + γ¹, for extracting the gluon distribution function f_g(x, Q²) in a proton at the LHC. One of the main background sources, as was established in [3], is photons produced in the neutral decay channels of π⁰, η and K⁰_S mesons [?]. So, to obtain a clean sample of events for the determination of the gluon distribution function with low background contamination, it is necessary to discriminate between the direct photon signal and the signal from photons produced in the neutral decay channels of π⁰, η and K⁰_S mesons.

Among other physical processes in which one needs to separate a single photon from background photons, one can note the H → γγ decay. Obtaining clean signals from the γ's in this process would improve the accuracy of the Higgs boson mass determination.

2. Data simulation. 

There are a number of CMS publications on photon and neutral pion discrimination (see [?], [?], [?]).

Information from the electromagnetic calorimeter (ECAL) crystal cells alone is used in this paper to extract a photon signal. ECAL cells were analyzed after performing the digitization procedure².

The GEANT-based full detector simulation package CMSIM (version 121) for CMS was used. We carried out a set of simulation runs including: (a) four particle types, γ and the π⁰, η, K⁰_S mesons, the mesons being forced to decay only via neutral channels (see Table 1, based on the PDG data [?]); (b) five E_T values: 20, 40, 60, 100 and 200 GeV; (c) three pseudorapidity intervals: |η| < 0.4 and 1.0 < |η| < 1.4 (two Barrel regions) and 1.6 < |η| < 2.4 (Endcap region).

About 4000 single-particle events of each type were generated with CMSIM. The information from the 5×5 ECAL crystal-cell window (ECAL tower) centred on the most energetic cell was used in the subsequent analysis, which is based on the artificial neural network (ANN) approach. The application of software-implemented neural networks to pattern recognition and triggering tasks is well known. This study was carried out with the JETNET 3.0 package [?]³, developed at CERN and the University of Lund [?].
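The tower selection described above can be sketched as follows. This is a minimal illustration only, assuming the digitized crystal energies are available as a 2D NumPy array indexed by (η, φ) cell; the function name `extract_tower` is hypothetical, and the periodicity of φ and the Barrel/Endcap geometry differences are ignored (cells beyond the array edge are treated as zero).

```python
import numpy as np

def extract_tower(cells: np.ndarray, half_width: int = 2) -> np.ndarray:
    """Return the (2*half_width+1)^2 window centred on the most energetic cell.

    `cells` is a hypothetical 2D array of digitized crystal energies indexed
    as [eta_index, phi_index]; cells outside the array are padded with zeros.
    """
    # Locate the most energetic crystal.
    i_max, j_max = np.unravel_index(np.argmax(cells), cells.shape)
    # Zero-pad so that a window near the array edge still has the full size.
    padded = np.pad(cells, half_width, mode="constant")
    # After padding, the original cell (i, j) sits at (i + hw, j + hw), so the
    # slice [i, i + size) in padded coordinates is centred on it.
    size = 2 * half_width + 1
    return padded[i_max:i_max + size, j_max:j_max + size]
```

With the default `half_width=2` this yields the 5×5 tower, with the leading cell at position (2, 2).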



¹ Along with bremsstrahlung photons produced from a quark in the fundamental 2→2 QCD subprocesses (see [?], [?]).

² Special thanks to A. Nikitenko for his help with the digitization routines.

³ It is available via anonymous ftp from thep.lu.se or from freehep.scri.fsu.edu.
Table 1: Decay modes of π⁰, η and K⁰_S mesons.

Particle   Br.(%)   Decay mode
π⁰         98.8     γγ
           1.2      e⁺e⁻γ
η          39.3     γγ
           32.2     3π⁰
           23.0     π⁺π⁻π⁰
           4.8      π⁺π⁻γ
K⁰_S       68.6     π⁺π⁻
           31.4     π⁰π⁰
ω          88.8     π⁺π⁻π⁰
           8.5      π⁰γ

3. Neural network architecture and input data. 

A feed-forward ANN with 11 input nodes, 5 hidden nodes in one hidden layer and a binary output, i.e. with the 11-5-1 architecture, was chosen for the analysis. "Feed-forward" implies that information can only flow in one direction (from input to output), and the ANN output directly determines the probability that an event characterized by an input pattern vector X(x₁, x₂, ..., x_n) (n = 11 in our case) belongs to the signal class.
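A minimal sketch of the forward pass of such an 11-5-1 network, written with NumPy rather than JETNET. The weight names (`w_ih`, `b_h`, `w_ho`, `b_o`) are hypothetical, and a logistic sigmoid is assumed for both layers; JETNET's own activation-function conventions may differ.

```python
import numpy as np

def sigmoid(x):
    # Logistic activation (an assumption; other sigmoid variants exist).
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_ih, b_h, w_ho, b_o):
    """Feed-forward pass of an 11-5-1 network.

    x: (n_events, 11) input patterns; w_ih: (11, 5) input->hidden weights;
    b_h: (5,) hidden thresholds; w_ho: (5,) hidden->output weights;
    b_o: scalar output threshold. Returns one output in (0, 1) per event.
    """
    hidden = sigmoid(x @ w_ih + b_h)      # hidden layer, shape (n_events, 5)
    return sigmoid(hidden @ w_ho + b_o)   # single output node, shape (n_events,)
```

Information flows strictly from the inputs through the hidden layer to the output, which is what "feed-forward" means here.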

The following data from the ECAL tower were fed to the 11 network input nodes (0th layer):

1-9: energies of the nine most energetic crystal cells, ordered by energy E: E of the leading cell was assigned to the 1st input node, E of the next-to-leading cell to the 2nd input node, and so on.

10, 11: two "width" variables defined as

$$\sigma_\eta^2 = \frac{\sum_{i=1}^{25} E_i\,(\eta_i - \eta_{cog})^2}{\sum_{i=1}^{25} E_i}, \qquad \sigma_\varphi^2 = \frac{\sum_{i=1}^{25} E_i\,(\varphi_i - \varphi_{cog})^2}{\sum_{i=1}^{25} E_i} \qquad (1)$$

Here E_i, η_i and φ_i are the energy and coordinates of cell i, the sums run over the 25 cells of the tower, and (η_cog, φ_cog) are the coordinates of the center of gravity of the ECAL tower considered. It was established that varying the number of crystal cells from 7 to 12 practically does not change the network performance.
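The construction of the 11 inputs can be illustrated as below. This is a sketch under stated assumptions: the tower is given as flat 25-element arrays of cell energies and cell-centre coordinates (a hypothetical layout), the center of gravity is taken as energy-weighted (the text does not define it explicitly), and φ periodicity is ignored.

```python
import numpy as np

def ann_inputs(energy, eta, phi, n_leading=9):
    """Build the 11 ANN inputs from one 5x5 ECAL tower.

    energy, eta, phi: arrays of shape (25,) holding each cell's energy and
    centre coordinates (hypothetical flat layout of the tower).
    """
    energy = np.asarray(energy, dtype=float)
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    total = energy.sum()
    # Inputs 1-9: leading cell energies in decreasing order.
    leading = np.sort(energy)[::-1][:n_leading]
    # Energy-weighted centre of gravity of the tower (assumed definition).
    eta_cog = np.dot(energy, eta) / total
    phi_cog = np.dot(energy, phi) / total
    # Inputs 10-11: the "width" variables of formula (1).
    sigma2_eta = np.dot(energy, (eta - eta_cog) ** 2) / total
    sigma2_phi = np.dot(energy, (phi - phi_cog) ** 2) / total
    return np.concatenate([leading, [sigma2_eta, sigma2_phi]])
```

Sorting the cell energies makes the first nine inputs independent of where in the tower the energy was deposited, while the two widths keep a compact measure of the transverse shower shape.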

To ensure convergence and stability, the total number of training patterns must be significantly (at least 20 to 30 times) larger than the number of independent parameters of the network, given by the formula

$$N_{ind} = (N_{in} + N_{on})\,N_{hn} + N_{ht} + N_{ot}, \qquad (2)$$

where N_in is the number of input nodes, N_on is the number of output nodes, N_hn is the number of nodes in the single hidden layer, N_ht is the number of thresholds in the hidden layer and N_ot is the number of output thresholds (here N_on = N_ot = 1). So, for our 11-5-1 architecture we get N_ind = (11 + 1)·5 + 5 + 1 = 66.
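The parameter count of formula (2), and the resulting number of training patterns per parameter, can be checked with a few lines of Python. The function name is hypothetical; one threshold per hidden node and per output node is assumed, and the 3000 + 3000 pattern count refers to the γ/π⁰ training sample described in Section 4.

```python
def n_independent(n_in, n_hidden, n_out):
    """Number of independent parameters, formula (2), for one hidden layer."""
    weights = (n_in + n_out) * n_hidden   # input->hidden and hidden->output weights
    hidden_thresholds = n_hidden          # one threshold per hidden node
    output_thresholds = n_out             # one threshold per output node
    return weights + hidden_thresholds + output_thresholds

n_ind = n_independent(11, 5, 1)                  # 66 for the 11-5-1 network
patterns_per_parameter = (3000 + 3000) / n_ind   # about 91 patterns per parameter
```

This ratio comfortably exceeds the 20 to 30 patterns per parameter required above.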



4. Training and testing of ANN. 

There are two stages in neural network (NN) analysis. The first is the training (learning) of the network with samples of signal and background events, and the second is the testing stage, which uses independent data sets. Learning is the process of adjusting the N_ind independent parameters in formula (2). The training starts with random weight values. After the training input vector (the 11 variables of the previous section) is fed in, the NN output O^(p) is calculated for every training pattern p and compared with the target value t^(p), which is 1 for a single photon and 0 otherwise. After N_p events have been presented to the network, the weights are updated by minimizing the mean squared error function E averaged over the number of training patterns:

$$E = \frac{1}{N_p}\sum_{p=1}^{N_p}\left(O^{(p)} - t^{(p)}\right)^2,$$

where O^(p) is the output value for pattern p, t^(p) is the training target value for this pattern, and N_p is the number of patterns (events) in the training sample per weight update (here N_p = 10, the JETNET default value).
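The error function above and the grouping of patterns into updates of N_p = 10 events can be sketched as follows. The function names are hypothetical, and shuffling the pattern order before each pass is an assumption, not something stated in the text.

```python
import numpy as np

def mean_squared_error(outputs, targets):
    """E = (1/N_p) * sum_p (O^(p) - t^(p))^2 over one group of patterns.

    Targets are 1 for single photons and 0 for meson (background) patterns.
    """
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean((outputs - targets) ** 2)

def batches(n_patterns, batch_size=10, rng=None):
    """Yield index groups of size N_p; the last group may be smaller."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(n_patterns)   # assumed random presentation order
    for start in range(0, n_patterns, batch_size):
        yield order[start:start + batch_size]
```

One weight update would then follow each group yielded by `batches`, using the error of that group.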

This error decreases during the network training procedure. Its behavior during training is shown in Fig. 1 (it drops from 0.117 to 0.096). Here "Number of epochs" is the number of training sessions (200 here). For each epoch the percentage of correctly classified events (patterns) is calculated with respect to the neural network threshold O_thr = 0.5, classifying the input as a "γ-event" if the NN output O > 0.5 and as a "background (π⁰, η, K⁰_S) event" if O < 0.5. Below we call this criterion the "0.5-criterion"⁴. About 3000 signal events (containing the ECAL data from single photons) and about 3000 events of each type of background (containing the ECAL data from the multiphoton π⁰, η, K⁰_S meson decays) were chosen for the training stage, i.e. more than 90 patterns per weight.
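The "0.5-criterion" bookkeeping amounts to counting how many patterns fall on the correct side of the output cut. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def percent_correct(outputs, is_photon, threshold=0.5):
    """Percentage of patterns classified correctly by the output cut.

    An event is called a photon if O > threshold and background otherwise.
    """
    outputs = np.asarray(outputs, dtype=float)
    is_photon = np.asarray(is_photon, dtype=bool)
    predicted_photon = outputs > threshold
    return 100.0 * np.mean(predicted_photon == is_photon)
```

Evaluating this after every epoch, once on the training sample and once on each test sample, gives curves of the kind shown in Fig. 2.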

After the network was trained, a test procedure was carried out in which events not used in the training were passed through the network. The sets of weights obtained after training the network with the γ/π⁰ samples were written to a file for every E_T and pseudorapidity interval. Then the same set of weights, read from the corresponding file, was applied to test sets of π⁰, η, K⁰_S events which the network had never seen before, to find the test (generalization) performance with respect to every type of input event set. 2000 signal and background events (about 1000 of each sort)

⁴ It should be noted that for practical applications various O_thr values can be used (see Section 5).
Fig. 1: Dependence of the mean error per epoch on the epoch number during the training procedure (|η| < 0.4, E_T = 40 GeV).
Fig. 2: Dependence of the separation probability on the epoch number for the training π⁰ sample and the test samples of π⁰, η, K⁰_S mesons (|η| < 0.4, E_T = 40 GeV) for the network output threshold O_thr = 0.5.
were used at the generalization stage. The output provided for each event can be considered as the probability that this event comes from the signal rather than the background sample. If the training is done correctly, the probability for an event to be signal is high if the output O is close to 1; conversely, if O is close to 0, the event is more likely to be background. The network performance with respect to the "0.5-criterion" as a function of the training and test epoch number (for each type of background) is presented in Fig. 2. One can see from the "training sample" plot (upper left corner) that the neural network performance becomes stable starting from epoch 180-200. In Figs. 4 and 5 we show the neural network output for the test samples of π⁰, η, K⁰_S mesons with E_T = 20, 40, 60, 100 GeV (the |η| < 0.4 interval is taken as an example). One can see that the range of network output values becomes narrower with growing E_T and, consequently, the signal and background event classes become less distinguishable.

The Manhattan algorithm for weight updating was used at the training stage. In Table 2 we compare it with other updating algorithms popular in high energy physics, with varying learning rate η (Backpropagation, Langevin) and noise term σ (Langevin), for the case of photons in the two Barrel regions with E_T = 40 GeV.
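The three updating rules compared in Table 2 can be contrasted in their simplest textbook form. This is a hedged sketch, not JETNET's implementation: the function names are hypothetical, JETNET's versions may include further terms not shown here, and the Langevin noise is modelled as Gaussian with width σ.

```python
import numpy as np

def backprop_step(w, grad, eta):
    # Plain gradient descent: the step is proportional to the gradient.
    return w - eta * grad

def langevin_step(w, grad, eta, sigma, rng):
    # Gradient descent plus Gaussian noise of width sigma.
    return w - eta * grad + sigma * rng.standard_normal(np.shape(w))

def manhattan_step(w, grad, eta):
    # Manhattan updating: only the sign of the gradient is used, so every
    # weight moves by exactly eta (or stays put if its gradient is zero).
    return w - eta * np.sign(grad)
```

Because the Manhattan step ignores the gradient magnitude, weights with very small gradients still move, which can help when the error surface is flat.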

Table 2: Dependence of the separation probability (%), using the "0.5-criterion", on the training method. E_T = 40 GeV, Barrel region.

Method              Backpropagation              Langevin
Parameters: η       1.0    0.1    0.01   0.001   1.0    0.1-0.01   0.01
            σ       -      -      -      -       0.01   0.01       0.001
|η| < 0.4           51     65     67     65      50     60         66
1.0 < |η| < 1.4     50     64     65     65      50     60         64


We see that even the best results obtained with the other algorithms (e.g. for |η| < 0.4 we get 67% for the Backpropagation and 66% for the Langevin algorithm) are 3-4% worse than the results obtained with the Manhattan algorithm: 70% for |η| < 0.4 and 68% for 1.0 < |η| < 1.4 (see Tables 3 and 4 of the next section).

5. Description of the results. 

The discrimination powers for the various types of test event samples with respect to the middle-point criterion (i.e. O_thr = 0.5) are presented in Tables 3-5 for three pseudorapidity intervals and four E_T values.

We see first that for both Barrel regions the γ/π⁰ (and γ/K⁰_S) separation efficiencies drop from 75-79% (79-82%) to 60-61% (56-63%), while for γ/η they practically do not change and remain at about 80%. All separation efficiencies decrease substantially in the Endcap region (Table 5).
Table 3: The separation probability (%) using the "0.5-criterion". |η| < 0.4.

Particle   E_T (GeV)
type       20        40        60        100
π⁰         79        70        64        60
η          83(87)    79(88)    80(88)    80(84)
K⁰_S       82(84)    75(79)    71(73)    63(66)


Table 4: The separation probability (%) using the "0.5-criterion". 1.0 < |η| < 1.4.

Particle   E_T (GeV)
type       20        40        60        100
π⁰         75        68        63        61
η          79(83)    77(84)    78(84)    78(79)
K⁰_S       79(83)    69(75)    66(70)    56(59)



Table 5: The separation probability (%) using the "0.5-criterion". 1.6 < |η| < 2.4.

Particle   E_T (GeV)
type       20        40        60        100
π⁰         63        59        56        54
η          72(77)    74(76)    66(68)    63(70)
K⁰_S       65(70)    59(58)    54(53)    51(51)



Though the neutral decay channels of the η meson, like those of the K⁰_S meson (see Table 1), have on average four photons, the latter meson has noticeably lower discrimination power with respect to single photons (especially for E_T > 40 GeV) and from this point of view is intermediate between the η and π⁰ mesons. This is due to the large difference (eight orders of magnitude) between the mean lifetimes of the η and K⁰_S mesons. In Table 6 we present the percentage of K⁰_S mesons that decay before reaching the ECAL surface as a function of their E_T and pseudorapidity. Thus, as the energy increases, the K⁰_S decay vertex moves closer to the ECAL surface, and for E_T > 60 GeV the γ/K⁰_S and γ/π⁰ discrimination powers are close⁵. The same fact is reflected in Fig. 6, where we plot the normalized distribution of the number of events over the minimal number of crystal cells containing 80% of the ECAL tower energy, for an initial particle (γ, π⁰, η, K⁰_S) transverse energy E_T = 40 GeV. To find this number we summed the cell energies E in decreasing order, starting with the most energetic cell, until the sum reached 80% of the tower energy. The result for K⁰_S (⟨N_cell⟩ = 2.9) is intermediate between those for η (⟨N_cell⟩ = 3.7) and π⁰ (⟨N_cell⟩ = ?) for the reason given above.

⁵ See also Figs. ?-?.
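The counting procedure behind Fig. 6 can be written compactly: sort the cell energies in decreasing order and find where the running sum first reaches 80% of the tower energy. The function name is hypothetical.

```python
import numpy as np

def min_cells_for_fraction(cell_energies, fraction=0.8):
    """Smallest number of cells whose summed energy reaches `fraction` of the tower."""
    e = np.sort(np.asarray(cell_energies, dtype=float))[::-1]   # decreasing order
    cumulative = np.cumsum(e)
    target = fraction * cumulative[-1]
    # First position where the running sum reaches the target, counted from 1.
    return int(np.searchsorted(cumulative, target) + 1)
```

For example, a tower with cell energies 50, 35, 10 and 5 needs two cells, since 50 + 35 already exceeds 80% of 100.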

Table 6: Percentage of decayed K⁰_S mesons as a function of their E_T and pseudorapidity η (only neutral decay channels are allowed).

                  E_T (GeV)
                  20      40      60      100     200
|η| < 0.4         74.5    49.7    37.5    25.8    13.7
1.0 < |η| < 1.4   67.5    46.7    33.6    21.1    12.3
1.6 < |η| < 2.4   52.2    34.6    24.8    16.1    7.8
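The decrease with E_T in Table 6 follows from time dilation: the mean decay length βγcτ grows linearly with momentum. The sketch below is an idealized estimate only, assuming straight-line flight from the origin to a cylinder of radius R (129 cm is taken as a rough value for the inner radius of the CMS ECAL Barrel), PDG values for the K⁰_S mass and cτ, and E_T ≈ p_T. In this cylinder the cosh η factors cancel, so it cannot reproduce the η dependence or the exact values of Table 6, and it does not describe the Endcap.

```python
import math

M_KS_GEV = 0.497611   # K0_S mass in GeV (PDG value)
CTAU_KS_CM = 2.6844   # K0_S mean decay length c*tau in cm (PDG value)

def ks_decayed_fraction(e_t_gev, eta, radius_cm):
    """Probability that a K0_S decays before reaching a cylinder of radius radius_cm.

    Idealized model: p = E_T * cosh(eta) and path length R * cosh(eta),
    so P = 1 - exp(-path / (beta*gamma*c*tau)).
    """
    p = e_t_gev * math.cosh(eta)                    # momentum (E_T ~ p_T assumed)
    decay_length = (p / M_KS_GEV) * CTAU_KS_CM      # beta*gamma*c*tau in cm
    path = radius_cm * math.cosh(eta)               # distance to the cylinder
    return 1.0 - math.exp(-path / decay_length)
```

With R = 129 cm this estimate gives roughly 70% at E_T = 20 GeV and about 21% at 100 GeV, the same decreasing trend as the |η| < 0.4 row of Table 6, though not the same values.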



Bracketed figures in Tables 3-5 are the γ/K⁰_S (and γ/η) separation probabilities that one would obtain if the neural network were trained with the γ/K⁰_S (and γ/η) samples. We see that the differences in separation probabilities by the "0.5-criterion" (for η as an example) between the case where the network is trained and tested with γ/η event samples and the case where it is trained with γ/π⁰ and tested with γ/η samples are within about 4-8% for all E_T and pseudorapidity intervals considered.

For various applications it is useful to know which rejection can be obtained for a given single photon selection efficiency. The respective "γ selection/meson rejection" curves are shown in Figs. 7-9 (for the three pseudorapidity intervals). The solid, dotted and dashed lines correspond to the rejection of the π⁰, K⁰_S and η meson multiphoton final states. The rejections are seen to decrease gradually with growing pseudorapidity.
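A selection/rejection curve of this kind can be obtained by scanning the output cut. A minimal sketch, assuming NN outputs for independent photon and meson test samples; the function name is hypothetical, and rejection is taken here as the fraction of meson events with O ≤ O_thr.

```python
import numpy as np

def efficiency_rejection(photon_out, meson_out, thresholds):
    """Photon efficiency and meson rejection for each output cut O_thr.

    Efficiency: fraction of photon events with O > O_thr.
    Rejection: fraction of meson events with O <= O_thr.
    """
    photon_out = np.asarray(photon_out, dtype=float)
    meson_out = np.asarray(meson_out, dtype=float)
    eff = np.array([np.mean(photon_out > t) for t in thresholds])
    rej = np.array([np.mean(meson_out <= t) for t in thresholds])
    return eff, rej
```

Scanning, for instance, `np.linspace(0.0, 1.0, 101)` and plotting efficiency against rejection traces one curve per meson type.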

As mentioned in Section 4, in practice various network output thresholds O_thr can be used to achieve better signal-to-background (S/B) ratios at the cost of a loss of statistics. We varied the output discriminator value O_thr from 0.4 to 0.7. The resulting S/B ratios (where S corresponds to single photon γ events and B to background neutral pion π⁰ events) for all E_T values and η intervals considered are given in Tables 7-9. One can see that the signal/background ratio grows with growing NN output threshold. For example, at E_T^γ = 20 GeV and for |η| < 0.4 it grows from 2.67 to 6.30, and for the same pseudorapidity interval at E_T^γ = 60 GeV it grows from 1.42 to 2.43 as O_thr varies from 0.40 to 0.70.
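A sketch of how such ratios and their statistical errors might be computed from test-sample outputs. It assumes S and B are simply the numbers of photon and π⁰ test events passing the cut O > O_thr in samples of equal size (no cross-section weighting), and that the errors are Poisson; the function names are hypothetical.

```python
import math
import numpy as np

def signal_to_background(photon_out, pion_out, threshold):
    """S/B: photon events over pi0 events passing the cut O > threshold."""
    s = int(np.count_nonzero(np.asarray(photon_out) > threshold))
    b = int(np.count_nonzero(np.asarray(pion_out) > threshold))
    if b == 0:
        return float("inf")
    return s / b

def s_over_b_error(s, b):
    """Poisson error on S/B: (S/B) * sqrt(1/S + 1/B), for S, B > 0."""
    return (s / b) * math.sqrt(1.0 / s + 1.0 / b)
```

For illustration only (hypothetical counts), S = 700 and B = 260 give S/B ≈ 2.69 with an error of about 0.2.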



Table 7: Signal(γ)/Background(π⁰). |η| < 0.4.

E_T       NN output cut O_thr
(GeV)     0.40   0.45   0.50   0.55   0.60   0.65   0.70
20        2.67   3.06   3.53   4.07   4.65   5.43   6.30
40        1.87   2.13   2.38   2.60   2.84   3.11   3.47
60        1.42   1.50   1.58   1.71   1.90   2.15   2.43
100       1.23   1.27   1.32   1.42   1.60   1.73   1.95
Table 8: Signal(γ)/Background(π⁰). 1.0 < |η| < 1.4.

E_T       NN output cut O_thr
(GeV)     0.40   0.45   0.50   0.55   0.60   0.65   0.70
20        2.02   2.38   2.81   3.37   3.96   4.65   5.41
40        1.81   2.03   2.31   2.54   2.79   3.02   3.33
60        1.49   1.59   1.69   1.93   2.12   2.28   2.51
100       1.24   1.51   1.56   1.63   1.79   1.97   2.28



Table 9: Signal(γ)/Background(π⁰). 1.6 < |η| < 2.4.

E_T       NN output cut O_thr
(GeV)     0.40   0.45   0.50   0.55   0.60   0.65   0.70
20        1.31   1.52   1.79   2.00   2.42   2.74   3.44
40        1.28   1.32   1.64   1.83   2.05   2.24   2.48
60        1.04   1.05   1.34   1.56   1.61   1.84   ?





The errors on the S/B values in Tables 7-9 above are of order 0.10-0.20.



6. Conclusion. 

One can draw some concluding remarks from the figures and tables presented above.

For example, retaining 50% and 75% of single photon events, we obtain the S^(γ)/B^(π⁰) ratios shown in Fig. ?.

The results obtained here can be improved by additional training of the network with border patterns, i.e. with events for which the NN output O is close to the γ/π⁰ border value 0.5 between the two classes of events. However, this would require at least 3-5 times larger statistics than considered here and, consequently, huge computing resources.
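Selecting such border patterns from a set of NN outputs is straightforward. A minimal sketch; the function name and the half-width `window` = 0.1 around the 0.5 border are hypothetical choices, not values from this study.

```python
import numpy as np

def border_patterns(outputs, window=0.1):
    """Indices of events whose NN output lies within `window` of the 0.5 border."""
    outputs = np.asarray(outputs, dtype=float)
    return np.flatnonzero(np.abs(outputs - 0.5) < window)
```

The selected events could then be used in additional training epochs; the larger the required sample of border events, the larger the total statistics needed, as noted above.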

The network performance appears to be very sensitive to the crystal cell size. The results obtained by the authors in parallel with [?] for the Barrel region with the old ECAL geometry (until 1998), with a 0.0145 × 0.0145 cell size, are much better than those given in the present paper⁶. The less impressive π⁰, η, K⁰_S meson rejections obtained from analyzing the Endcap cells necessitate the use of a preshower in this region; very high rejection powers for the preshower, obtained with an ANN-based analysis, were shown in [?].



⁶ Taking 90% of single photons at E_T = 40 GeV, for example, one could reach a π⁰ rejection efficiency of about 60% with the old geometry instead of about 30% with the new one used here.
Fig. 4: Neural network output for the test samples of π⁰, η, K⁰_S mesons (dotted lines) and photons (solid line) (|η| < 0.4, E_T = 20, 40 GeV).

Fig. 5: Neural network output for the test samples of π⁰, η, K⁰_S mesons (dotted lines) and photons (solid line) (|η| < 0.4, E_T = 60, 100 GeV).

Fig. 6: Normalized distribution of the number of events over the minimal number of crystal cells containing 80% of the ECAL tower energy. E_T = 40 GeV, 1.0 < |η| < 1.4.

Fig. 7: Single photon selection efficiency vs. rejection of π⁰, η, K⁰_S mesons for four E_T values and the |η| < 0.4 interval.

Fig. 8: Single photon selection efficiency vs. rejection of π⁰, η, K⁰_S mesons for four E_T values and the 1.0 < |η| < 1.4 interval.

Fig. 9: Single photon selection efficiency vs. rejection of π⁰, η, K⁰_S mesons for four E_T values and the 1.6 < |η| < 2.4 interval.